home *** CD-ROM | disk | FTP | other *** search
- ZCatalog Tutorial
-
- This document provides a tutorial for 'ZCatalog', the new search
- engine machinery in Zope. The audience for the document is content
- managers.
-
- Contents
-
- o What is it? What's it for? Why's it so cool?
-
- o Installing ZCatalog
-
- o ZCatalog Objects
-
- o Example using ZCatalog
-
- o Creating Search Forms And Result Reports
-
- o Using ZCatalog In A Zope Site
-
- o ZCatalog vs. Catalog
-
- What is it? What's it for? Why's it so cool?
-
- The 'ZCatalog' provides powerful indexing and searching on a Zope
- database using a Zope management interface. A 'ZCatalog' is a
- Zope object that can be added to a Folder, managed through the
- web, and extended in many ways.
-
- The 'ZCatalog' is a very significant project, providing a number
- of compelling features:
-
- o **Searches are fast**. The data structures used by the index
- provide extremely quick searches without consuming much memory.
-
- o **Searches are robust**. The 'ZCatalog' supports boolean
- search terms, proximity searches, synonyms and stopwords.
-
- o **Indexing is wildly flexible**. A 'ZCatalog' can catalog
- custom properties and track unique values. Since 'ZCatalog'
- catalogs objects instead of file handles, you can index any
- content that can have a Python object wrapped around it. This
- also lets objects participate in how they are cataloged,
- e.g. de-HTML-ifying contents or extracting PDF properties.
-
- o **Usable outside of Zope**. The software is broken into a
- Python 'Catalog' which wrapped by a 'ZCatalog'. The Python
- 'Catalog' can be used in any Python program; all it requires is
- the Z object database and the indexing machinery from Zope.
-
- o **Transactional**. An indexing operation is part of a Zope
- transaction. If something goes wrong after content is indexed,
- the index is restored to its previous condition. This also means
- that Undo will restore an index to its previous condition.
- Finally, a 'ZCatalog' can be altered privately in a Version,
- meaning no one else can see the changes to the index.
-
- o **Cache-friendly**. The index is internally broken into
- different "buckets", with each bucket being a separate Zope
- database object. Thus, only the part of the index that is needed
- is loaded into memory. Alternatively, an un-needed part of the
- index can be removed from memory.
-
- o **Results are lazy**. A search that returns a tremendous
- number of matches won't return a large result set. Only the
- part of the results, such as the second batch of twenty, are
- returned.
-
- The 'ZCatalog' is a free, Open Source part of the Zope software
- repository and thus is covered under the same license as Zope. It
- is being developed in conjunction with the Zope Portal Toolkit
- effort. However, the 'ZCatalog' product is managed as its own
- module in CVS.
-
- Installing ZCatalog
-
- 'ZCatalog' can be downloaded from the Zope download area and is
- also a module in the public CVS for Zope. Untar it while in the
- root directory of your Zope installation::
-
- $ cd Zope-2.0.0a3-src/
- $ tar xzf ../ZCatalog-x.x.tgz
-
- Windows users can use WinZip or a similar utility to accomplish
- the same thing.
-
- Also, Zope 2.0.0a3 does not have the latest version of UnIndex and
- UnTextIndex which fix a couple of bugs in the alpha 3 versions.
- The latest CVS of the SearchIndex packages *must* be used.
-
- Remember, you have to restart your Zope server before you will see
- 'ZCatalog'.
-
- ZCatalog Objects
-
- A 'ZCatalog' performs two activities: indexing information and
- performing searches.
-
- Most the work is done in the first step, which is getting objects
- into the index. This is done in two ways. First, if your objects
- are ZCatalog-aware they automatically update the index when they
- are added, edited or directly deleted. A ZCatalog-aware object is
- one that is an instance of a 'Z Class' that informs the 'ZCatalog'
- of changes. *Directly deleted* means the object was deleted from
- a Folder, not the deletion of a containing Folder.
-
- The second way that site contents get updated is by "finding"
- information "into" the 'ZCatalog'. An operation based on Zope's
- Find view traverses Folders looking for objects matching the
- criterion. The objects are then registered with the Catalog.
- Objects in the index but no longer in the site are removed from
- the Catalog.
-
- Either way -- automatically updating or walking the Folders --
- 'ZCatalog' indexes the objects it finds. The 'ZCatalog' is set up
- to look for properties, each of which are added to the index.
-
- There are two kinds of indexes, called FieldIndex and TextIndex.
- FieldIndex indexes treat data atomically. The entire contents of a
- FieldIndex-indexed property is treated as a unit. With a
- TextIndex index, it is broken into words which are indexed
- individually. A TextIndex is also known as *full-text index*.
-
- Note that the 'ZCatalog' doesn't track ZCatalog-unaware objects
- after it has indexed them. This means that the 'ZCatalog' must
- reindex its objects occasionally when the objects have been
- changed. Out of date indexes can be prevented by inheriting from
- a ZCatalog-aware class which can tell the 'ZCatalog' to reindex it
- whenever a change is made. Just such a class will be included
- with the Portal toolkit.
-
- ZCatalogs are "searchable objects", meaning they cooperate with Z
- Search Interfaces documented in Z SQL Methods. Creating a search
- form for a 'ZCatalog' is a simple matter of adding a Z Search
- Interface from the management screen and filling in a form.
- ZCatalogs can also be queried directly from DTML, as shown in the
- example below.
-
- Example using Z Classes
-
- The first example shows how to give your Zope site a long-desired
- feature: full text-searches of your content. The example assumes
- you already have a number of DTML Methods/Documents to catalog.
-
- o Install 'ZCatalog' as instructed above
-
- o In the root folder of your Zope server, add a 'ZCatalog'.
-
- o Type in the id 'catalog' and hit 'Add'.
-
- You now have a brand new 'ZCatalog' named 'catalog' in your root
- folder.
-
- o Click on it.
-
- Now you are looking at the 'ZCatalog' 'Contents' view. It says
- the catalog is empty. We'll catalog some objects in a moment, but
- first we have to tell it what portions of objects we are
- specifically interested in.
-
- o Click on 'Indexes'.
-
- This management view is where the attributes to be indexed are
- defined.
-
- o In the 'Add index' field, type 'raw'.
-
- o Click 'Add'.
-
- Now that the indexes are defined, a set of objects can be selected
- for cataloging.
-
- o Click on 'Find items to ZCatalog'.
-
- For this example, we are only interested in DTML Documents and
- Methods.
-
- o Deselect 'All type'.
-
- o Select 'DTML Method' and 'DTML Document'.
-
- o Click 'Find'.
-
- ZCatalog will report how many items it found, and then present an
- interface for excluding specific objects.
-
- o Click 'Catalog Items'.
-
- Great, now that the catalog is stocked, we can create a user
- interface to it.
-
- o Return to the root folder's management view.
-
- o Add a 'Z Search Interface'.
-
- 'ZCatalog' participates in the Zope Search architecture. You
- simply have to fill in this form, and a basic user interface will
- be created.
-
- o Select 'catalog' in the list beside 'Select one or more searchable
- objects'.
-
- o Beside 'Report Id', type 'report'.
-
- o Beside 'Search Input Id', type 'search'.
-
- 'report' and 'search' are the Ids of two DTML Methods which will
- be created in your root folder.
-
- o Click 'Add'.
-
- Congratulations, if all has gone well, you can now find references
- to any word in your DTML pages. Try it by viewing 'search'. Type
- a common word in the 'Raw' field, and you should be presented with
- a list of hits. However, none of the results returned can be
- clicked on. To fix this, go to the management view of 'report'.
- 'report' is called by 'search' to display the results from
- 'catalog'. 'report' is just a simple '<!--#in catalog-->' loop
- with a few refinements. 'catalog' knows which results to return
- by looking at the REQUEST variable, which contains the input from
- the 'search' form.
-
- o In the source of 'report', find the following line::
-
- <tr><!--#var title--></tr>
-
- o Replace it with this::
-
- <tr>
- <a href="<!--#var "catalog.getpath(data_record_id_)"-->">
- <!--#var title-->
- </a>
- </tr>
-
- This is a little confusing at first. Keep in mind that ZCatalog
- does not return a list of your database objects. What it returns
- are actually fairly unintelligent instances of a Record subclass.
- These record objects contain copies of data from attributes of
- catalog objects. The 'ZCatalog' 'MetaData Table' view defines
- which attributes are copied.
-
- (By default, these record objects are just SLIGHTLY more
- intelligent than a raw tuple. 'Catalog' can be told to use a
- custom, intelligent class for results. Please see the 'Catalog'
- __init__ method in 'lib/python/Products/ZCatalog/Catalog.py' for
- more information.)
-
- Fortunately, ZCatalog provides a utility function for going from
- result objects to the object's path. It is called, aptly enough,
- 'getpath'. 'getpath' expects to be passed the unique integer
- identifier of the cataloged object. Results store that id as
- 'data_record_id_'.
-
- Commit this change, and perform another search. Now the title can
- be clicked on to take you to the full page.
-
- Example cataloging custom objects
-
- As if full-text searches of your entire site weren't good enough,
- ZCatalog can also catalog Z Classes, Products, and in fact any
- Python object you can put in a ZODB. Here is an example using a Z
- Class, but the principles apply to any kind of object.
-
- First, we're going to need something to catalog. Follow the 'Z
- Class' tutorial to create the CD 'Z Class'. Back? Good.
-
- o Create a folder, 'CDs', and create a number of instances of
- the CD Z Class in it.
-
- 'cd1' through 'cd5' should be plenty. Remember to fill each of
- them in from their Properties view.
-
- Now we want to create a searchable catalog of CDs.
-
- o Go to the 'CDs' folder and create a 'ZCatalog' with an ID 'cd_cat'.
-
- o Click on the objects Indexes view.
-
- This screen shows that, by default, 'ZCatalog' is interested in an
- object's 'id','title', 'meta_type', and
- 'bobobase_modification_time'. You will almost always want to
- index additional information. In this case, we would also like to
- index the artist and description of CDs.
-
- o Type 'artist' into the 'Add Index' field.
-
- For the sake of example, we're going to use a FieldIndex index for
- artist. This will give us the option of putting an HTML SELECT
- box for artists on the search form.
-
- o Select FieldIndex from the Index type drop down, and click
- 'Add'.
-
- o Also add an index for 'description', but leave TextIndex
- selected.
-
- This will allow us to search for individual words within the
- description.
-
- o Click on 'MetaData Table'.
-
- This is where we tell the 'ZCatalog' what attributes of cataloged
- objects to cache. These cached values are available from search
- results without having to look up the actual indexed object. The
- tradeoff for the speed is extra memory, as information from the
- content is duplicated in the 'ZCatalog'.
-
- You will probably want to keep the schema light-weight, so we're
- not going to add 'description' to it. Type 'artist' in the 'Add
- column' field and click 'Add'.
-
- o Click on the 'Find Items to Catalog' view.
-
- This is the interface you use to tell the 'ZCatalog' which items
- to index. Right now, beside 'Find objects of type:', 'All types'
- is selected.
-
- o Deselect 'All types'.
-
- O Scroll down and select CD.
-
- You could use the rest of the form to be more specific, but since
- we want to catalog all the CDs,
-
- o Click 'Find'.
-
- 'ZCatalog' will report 'Found 5 items.' It is now giving you an
- opportunity to exclude some of the matched items from the index.
- Again, we want all of them, so,
-
- o Click 'Catalog Items'.
-
- You should at this point see a list of the indexed objects. Also
- of note is the 'Update Catalog' button. You have to use it
- whenever you want your 'ZCatalog' to notice changes you've made to
- the objects it's indexed.
-
- Creating Search Forms And Result Reports
-
- This catalog isn't much good without some way of querying it.
-
- o Go back to your 'CDs' folder's management screen and add a Z
- Search Interface.
-
- The search add form will automatically detect your cd_cat
- 'ZCatalog' and offer it as a searchable document. Make sure it is
- selected.
-
- o Fill in 'cd_report' for 'Report ID' and 'cd_search' for
- 'Search Input ID'.
-
- Those are the ids of two DTML methods that will be generated in
- the 'CDs' folder.
-
- o Click 'Add'.
-
- o View the 'cd_search' Catalog (at, for example,
- http://localhost:9673/CDs/cd_search).
-
- You will see a basic search interface, with fields for searching
- on 'title', modification date, 'id', 'artist', 'meta type' and
- 'description'. If you fill in one more more of the fields and
- click 'Submit Query', cd_report will be displayed. It is passed
- the search criteria and uses it to get a list from cd_cat to
- iterate over. It is merely displaying the information from the
- ZCatalog's MetaData table, but of course it can be enriched.
-
- Try a few more searches. You'll find that you can type any single
- word from the title or description and get a match, but for artist
- you must type the exact string. That's because artist was indexed
- as a FieldIndex, which gives us an opportunity to present a more
- convenient interface.
-
- Go back to the 'cd_search' management interface, and change
- it's source to look like this::
-
- <xmp>
- <!--#var standard_html_header-->
- <form action="cd_report" method="get">
- <h2><!--#var document_title--></h2>
- Enter query parameters:<br><table>
- <tr><th>Title</th>
- <td><input name="title"
- width=30 value=""></td></tr>
- <tr><th>Artist</th>
- <td>
- <select name="artist">
- <option value="">All</option>
- <!--#in expr="cd_cat.uniqueValuesFor('artist')"-->
- <option value="<!--#var sequence-item-->">
- <!--#var sequence-item-->
- </option>
- <!--#/in-->
- </select>
- </td>
- </tr>
- <tr><th>Description</th>
- <td><input name="description"
- width=30 value=""></td></tr>
- <tr><td colspan=2 align=center>
- <input type="SUBMIT" name="SUBMIT" value="Submit Query">
- </td></tr>
- </table>
- </form>
- <!--#var standard_html_footer-->
- </xmp>
-
- This is a search form somewhat more appropriate for the CD 'Z
- Class'. Unrelated fields have been removed, and the 'artist'
- field has been changed to a drop-down menu. Let's augment the
- output of 'cd_report' to make the title a link to the actual CD
- object.
-
- Taking a look at 'cd_report', note that the search results are
- obtained with a simple '<!--#in cd_cat ...-->' tag. The search
- criteria is automatically obtained by the 'ZCatalog' from the form
- input. The line we're interested in is this one::
-
- <td><!--#var title--></td>
-
- Change it to read::
-
- <td>
- <a href="<!--#var "cd_cat.getpath(data_record_id_)"-->">
- <!--#var title-->
- </a>
- </td>
-
- Now, assuming you have added the index_html document template to
- your CD 'Z Class', clicking on a search result will take you to
- the CD's detailed display.
-
- Using 'ZCatalog' In A Zope Site
-
- The 'ZCatalog' provides high-speed access to what is on your site.
- Thus, the 'ZCatalog' can be used to re-engineer the way your site
- is laid out.
-
- For instance, a Slashdot-style presentation is simple. Just
- insert some DTML that asks the 'ZCatalog' for recent items.
- Alternatively, a Site Map is nothing more than presenting the
- contents of the catalog. A page with tree-based browsing of
- software packages by category is also easy. Perhaps you'd like to
- provide a link that lists all the packages the current user has
- authored.
-
- Thus, the 'ZCatalog' isn't just about searching. It can be used
- as the DTML-scriptable engine for browsing a site as well.
-
- Since the 'ZCatalog' is a normal Zope folderish object, you can
- also create DTML Methods inside it to present the catalog
- contents. For instance, perhaps you'd like to dump the contents
- of the site as an RDF stream, or do content syndication with RSS.
- These are just DTML Methods that change the 'Content-Type:' and
- send back XML. All without actually waking up any of the content
- objects in the site.
-
- ZCatalog vs. Catalog
-
- The real star of this package is the 'Catalog' module. All the
- heavy lifting is done by 'Catalog'. 'ZCatalog' is basically a
- Zope-aware wrapper around Catalog, which can be used on it's own
- outside the Zope framework. The only requirement is that you are
- using ZODB as your object store.
-